Lightning Fast and Space Efficient Inequality Joins
Authors
Abstract
Inequality joins, which join relational tables on inequality conditions, are used in various applications. While there has been a wide range of optimization methods for joins in database systems, from algorithms such as sort-merge join and band join to various indices such as B-tree, R*-tree and Bitmap, inequality joins have received little attention, and queries containing such joins are usually very slow. In this paper, we introduce fast inequality join algorithms. We put columns to be joined in sorted arrays and we use permutation arrays to encode positions of tuples in one sorted array w.r.t. the other sorted array. In contrast to sort-merge join, we use space efficient bit-arrays that enable optimizations, such as Bloom filter indices, for fast computation of the join results. We have implemented a centralized version of these algorithms on top of PostgreSQL, and a distributed version on top of Spark SQL. We have compared against well-known optimization techniques for inequality joins and show that our solution is more scalable and several orders of magnitude faster.
1. ONCE UPON A TIME . . .
Bob, a data analyst working for an international provider of cloud services, wanted to analyze revenue and utilization trends from different regions. In particular, he wanted to find out all those transactions from the West-Coast that last longer and produce smaller revenues than any transaction in the East-Coast. In other words, he was looking for any customer from the West-Coast who rented a virtual machine for more hours than any customer from the East-Coast, but who paid less. Figure 1 illustrates a data instance for both tables. He wrote the following join query for such a task:
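Separately from Bob's query, the idea summarized in the abstract (sorted arrays, a permutation array, and a bit array) can be illustrated with a small Python sketch. This is a simplified illustration under stated assumptions, not the paper's implementation: the function name `inequality_join`, the `(duration, revenue)` row layout, and the sample rows are invented for this example, a `bytearray` stands in for the bit array, and the Bloom filter optimization is left out.

```python
from bisect import bisect_left


def inequality_join(west, east):
    """Return (w, e) index pairs where west[w] lasts longer and earns less than east[e].

    Rows are hypothetical (duration, revenue) tuples; this layout is an assumption.
    """
    # Sorted array 1: East row ids ordered by duration, ascending.
    east_by_dur = sorted(range(len(east)), key=lambda i: east[i][0])
    durations = [east[i][0] for i in east_by_dur]
    # Sorted array 2: East row ids ordered by revenue, descending.
    east_by_rev = sorted(range(len(east)), key=lambda i: -east[i][1])
    # Permutation array: slot in duration order of the k-th row in revenue order.
    slot_of = {row: slot for slot, row in enumerate(east_by_dur)}
    perm = [slot_of[row] for row in east_by_rev]

    bits = bytearray(len(east))  # one flag per duration-order slot
    result = []
    k = 0
    # Visit West rows from highest to lowest revenue so marked flags stay valid.
    for w in sorted(range(len(west)), key=lambda i: -west[i][1]):
        dur_w, rev_w = west[w]
        # Mark every East row whose revenue is strictly greater than rev_w.
        while k < len(east_by_rev) and east[east_by_rev[k]][1] > rev_w:
            bits[perm[k]] = 1
            k += 1
        # Slots before `limit` hold East rows with strictly smaller duration.
        limit = bisect_left(durations, dur_w)
        for slot in range(limit):
            if bits[slot]:
                result.append((w, east_by_dur[slot]))
    return result


# Invented sample rows: (duration in hours, revenue).
west = [(140, 9), (100, 12), (90, 5)]
east = [(100, 12), (140, 11), (80, 10), (90, 5)]
print(sorted(inequality_join(west, east)))  # [(0, 0), (0, 2), (2, 2)]
```

The sketch sorts once per column and then only scans flags, which is the part a sort-merge join cannot do for two inequality conditions at once. The linear scan over `bits` is the step that an index over the bit array would shorten.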
Similar resources
Errata for "Lightning Fast and Space Efficient Inequality Joins" (PVLDB 8(13): 2074-2085)
This is in response to recent feedback from some readers, which requires some clarifications regarding our IEJoin algorithm published in [1]. The feedback revolves around four points: (1) a typo in our illustrating example of the join process; (2) a naming error for the index used by our algorithm to improve the bit array scan; (3) the sort order used in our algorithms; and (4) a missing explan...
VHF Lightning Observations by Digital Interferometry from ISS / JEM-GLIMS
The Global Lightning and sprIte MeasurementS (GLIMS) mission is now ongoing on the Exposed Facility of the Japanese Experiment Module (JEM-EF) of the International Space Station (ISS). This paper focuses on an electromagnetic (EM) payload of the JEM-GLIMS mission, the very high frequency (VHF) broadband digital InTerFerometer (VITF). The JEM-GLIMS mission is designed to conduct comprehensive observations with both EM ...
Application of Intelligent Water Drops in Transient Analysis of Single Conductor Overhead Lines Terminated to Grid-Grounded Arrester under Direct Lightning Strikes
In this paper, the intelligent water drops (IWD) algorithm is used to analyze a single overhead line connected to a grid-grounded arrester. First, Norton's equivalent circuit of the overhead line over lossy soil is computed by the method of moments (MoM); then, for the problem under consideration, a nonlinear equivalent circuit in the frequency domain is proposed. Finally applying inte...
Efficient Evaluation of the Valid-Time Natural Join
Joins are arguably the most important relational operators. Poor implementations are tantamount to computing the Cartesian product of the input relations. In a temporal database, the problem is more acute for two reasons. First, conventional techniques are designed for the optimization of joins with equality predicates, rather than the inequality predicates prevalent in valid-time queries. Seco...
Processing Inequality Queries
Bernstein and Goodman showed that natural inequality (NI) queries can be processed efficiently by semijoins, if there are no multiple inequality join edges, nor cycles with one or zero doublet. In this paper, procedures to handle these cases efficiently are given. Multiple inequality join edges can be processed by multi-attribute inequality semijoins. Two procedures based on generalized semi-j...
Journal title:
- PVLDB
Volume 8, issue
Pages -
Publication date: 2015